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The focus of this paper is on trend estimation for a general state- 
space model Yt = fit + Bt, where the dth difference of the trend {fJ-t} 
is assumed to be i.i.d., and the error sequence {et} is assumed to be 
a mean zero stationary process. A fairly precise asymptotic expres- 
sion of the mean square error is derived for the estimator obtained 
by penalizing the dth order differences. Optimal rate of convergence 
is obtained, and it is shown to be "asymptotically equivalent" to a 
nonparametric estimator of a fixed trend model of smoothness of or- 
der d — 0.5. The results of this paper show that the optimal rate of 
convergence for the stochastic and nonstochastic cases are different. 
A criterion for selecting the penalty parameter and degree of differ- 
ence d is given, along with an application to the global temperature 
data, which shows that a longer term history has nonlinearities that 
are important to take into consideration. 

1. Introduction. Trend estimation for time series data lias a long history, 
and the literature, understandably, is quite vast. The basic statistical model 
and the estimation problem are quite easy to describe. The observed series 
{Yt : t = 1, . . . , n} is modeled as 

(1.1) Yt = fit + et, 

where the error series {et} is assumed to be a mean zero stationary process. 
In some cases, the error series may have variances that change with time, 
but we will not worry about that issue here. The goal is to estimate the 
trend {^t} on the basis of the observed data. In the literature, two types of 
structures of the trend are assumed: fixed and stochastic. Asymptotic anal- 
ysis of the estimator of the random trend model (a version of the state-space 
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model) is the focus of this paper. We derive the expression for the asymp- 
totic mean square error of the trend estimate obtained by penahzing finite 
differences and then obtain its rate of convergence. The main asymptotic re- 
sults presented in this paper are not in terms of upper bounds, rather they 
are asymptotic expressions, which are correct up to first order with bounds 
on the second order terms. 

In the fixed trend model, it is usually assumed that the trend is of the 
form fit = f^it/^)^ where /i is an unknown function (presumably smooth) 
on the interval [0,1]. Various methods, such as kernel [Altman (1990) and 
Truong (1991)], local polynomial [Beran and Feng (2001)], spline [Burman 
(1991)] and wavelets [Johnstone and Silverman (1997)], have been employed 
in order to estimate the trend. Asymptotic properties of such methods, rates 
of convergence and issues on smoothing parameter selection have been well 
investigated by many authors [see, e.g., Eubank (1988), Fan and Yao (2003), 
Tran et al. (1996) and Robinson (1997)]. The literature is vast, and the 
recent book by Fan and Yao is a good source on this topic and the relevant 
references. 

The stochastic trend models are quite popular in the time series literature 
[see Chapter 9 in Box, Jenkins and Reinsel (1994), Chapter 3 in Durbin and 
Koopman (2001), Harvey (1991) and Chapter 4 in Shumway and Stoffer 
(2000)]. While deterministic trends may sometimes have simpler expressions, 
it is the belief of many time series analysts that a stochastic trend model 
is more realistic in real applications. A discussion on deterministic versus 
stochastic trend can be found in Chapter 4.1 in the book by Box, Jenkins 
and Reinsel (1994). For the random trend model, it is assumed that the dth 
order differences 



are mean zero i.i.d. variables with variance o"^, where d > may or may 
not be an integer. The usual application in the literature assumes that d is 
a known positive integer. Independence of {V^fit} is not really necessary, 
stationarity is enough (see Remark 1 in Section 2). In this paper, we restrict 
our attention to the case when d is an integer. In a forthcoming paper, we 
will deal with the general case d > in detail. An alternative representation 
for the random trend is 



where /?j's, the coefficients of the polynomial, can be taken to be fixed or 
random. Chan and Palma (1998) have investigated finite approximations 
to the log-likelihood for state-space models in the long memory (fractional 
difference) case. State-space models have been quite popular among many 
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statisticians, and efficient algorithms such as EM and MCMC have been 
developed for estimation of the trend [see, e.g., Shumway and Stoffer (2000) 
and Durbin and Koopman (2001)]. However, not much is known about the 
asymptotic properties of the estimators, even though there is a strong par- 
allel with the method of smoothing splines [see, e.g., Wahba (1978), Wecker 
and Ansley (1983) and Kohn, Ansley and Wong (1992)]. 
Assuming that {et} are i.i.d. 7V(0,cj2) and {VVt} 

are i.i.d. N{0,ax,), 

we note that (1.1) and (1.2a) define a special case of the state space model, 
where (1.1) is the observation equation and (1.2) is the state equation. If the 
errors are not i.i.d. but stationary, a second state equation defining {et} as a 
stationary autoregressive process can be added. For the i.i.d. error case, the 
estimator of the trend is obtained by maximizing the Gaussian likelihood; 
that is, by minimizing 

(1-3) E (Yt-fitf + i^ E 

l<t<n d+l<t<n 

with respect to /x = (/ii, . . . jfin)', where u = a^/a'^. The estimate, for fixed v 
and d, is given by fit = E{fit\Yi, . . . , Yn), which can also be computed as the 
Kalman smoothers. Note that, restricting the class to linear estimators, the 
above minimization problem is still valid without the Gaussian assumptions 
on {e^} and {jt}, and the resulting estimator in such a case can be described 
as the best linear unbiased predictor (BLUP) of {fJ,t} [Kimeldorf and Wahba 
(1970)]. 

Wahba (1978) established a connection to spline smoothing. Suppose that 
the trend is given by 

t 



(1.4) nt= E Ojt^ + T I {t-uf-^dW{u 

0<i<d-l •'^ 



0<j<d 

where is a standard Brownian motion and = {9q, . . . , Od-i)' is multivari- 
ate normal N^^a, kl). Assuming a diffuse prior for 9, that is, if A; — > oo, then 
minimizing 

2 

. du 

' lr\ I Htl^ I 

l<t<n 



with respect to ^i, . . . ,fj,n leads to the estimator fit = E{fit\Yi, . . . , Yn), which 
is precisely the same estimator as one obtains for the state-space model [see 
Wahba (1978) and Green and Silverman (1994) for details]. In other words, 
under the diffuse prior model of (1.4), the smoothing spline estimate of //^ 
obtained by minimizing (1.5) coincides with the estimator obtained by a 
criterion that involves penalizing finite differences. However, the properties 
of the latter estimator are still not well understood. 
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As mentioned in the first paragraph, this paper is devoted to the asymp- 
totic study of the estimator jl obtained by minimizing (1.3) (and its weighted 
least squares variant). Specifically, we obtain an asymptotic expression of the 
mean square error under fairly general assumptions. We only assume that 
{et} is mean zero stationary whose spectral density at zero is positive and 
that {VVt} 

are i.i.d. Note that no assumption of Gaussianity is needed for 
our results to hold. We show that the smallest mean square error is constant 
times 7T,2(rf-o.5)/(2d) ^ This rate of convergence is the same as one obtains for 
the fixed trend case when the order of smoothness is d — 0.5. It may be worth- 
while to point out that the rates for the stochastic and nonstochastic cass 
are different. For the nonstochastic case, the rate of convergence would be 
of order 7j,2rf/(2rf+i)_ penalized least squares method proposed here does 
not require the knowledge of whether the underlying trend is deterministic 
or stochastic. So, if the underlying trend is stochastic, but one mistakes it 
be deterministic and applies the method proposed here, all the results given 
in this paper remain valid. However, if the underlying trend is determinis- 
tic, then it is reasonable to believe (following the theory of nonparametric 
function estimation) that the optimal rate associated with our procedure is 
of order n'"^'^^'^+^\ 

In Section 2, we state the main results where the asymptotic expressions 
for the mean square error of the trend estimates for the state-space model 
are obtained. We discuss two types of estimates: ordinary least squares and 
the weighted least squares. In Section 3, we present a criterion for estimating 
the penalty parameter v and order of difference d along with a numerical 
example. Finally in the Appendix we present the proofs of the results of 
Section 2. The proofs of the results depend on the properties of Toeplitz, 
Hankel and circulant matrices. 

2. The main results. We now take up the issue of trend estimation in a 
state-space model. Two methods will be discussed: first, the ordinary least 
squares and, second, in Section 2.1, the weighted least squares. Clearly, these 
two methods yield the same estimate when the error sequence is assumed 
to be i.i.d. Consider the time series = /Uf + e*, t = 1, . . . , n, where is the 
trend and £t is a mean zero stationary process. We will assume that the dth 
difference of the trend is i.i.d. In the literature, it is usually assumed that the 
errors £t are i.i.d. Gaussian and the dth. differences of the trend V^/ij = 7t 
are also i.i.d. Gaussian variables. In such the trend can be estimated 

by minimizing the negative of the log-likelihood; that is, by minimizing (1.3) 
with respect to /it's, (T^ and a^. There are a few points to be noted here. The 
assumption of normality can be dispensed with, and, in such a case, we can 
treat this as a problem of obtaining the best prediction of linear predictor 
(BLUP) of the random effects 7j's for the mixed linear model. Throughout 
this paper, the error sequence {et} is assumed to be stationary. So, instead 
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of minimizing the penalized ordinary least squares criterion given in (1.3), 
we should perhaps minimize a weighted least squares criterion, and this is 
taken up in Section 2.1. In this subsection, we still deal with the ordinary 
least squares criterion given in (1.3) in the presence of stationary error series 
{et}- In the next section, we will describe a method for estimating v (and 
the degree of difference d) . 

Before we begin, it is important to note that we would like the aver- 
age signal-to-noise ratio E\\ii\\'^ / {nal) to be bounded between two positive 
constants. This can be guaranteed if the quantity £^||/i|p/n stays bounded 
between two positive constants. This requirement is met if the dth order 
difference 7t = V^/ii has a variance of the form 

(2.1) cT2=r2V"-' 

for some positive constant r. In order to see why this is true, we need to 
rewrite the random trend in (1.2b) as 

^^t= ( W"' + E (-1)*^' it~-n ) = + say. 

0<j<d-l l<j<t ^ ' 

First, let us consider the polynomial part. If the coefficients /Sj's are noran- 
dom and are bounded in absolute values by a constant independent of n, 
then ||//i|p/n is finite. If /3j's are random with finite means and variances 
(which do not depend on n), then is also finite. Now, let us con- 

sider the purely random part [i^t of the trend. By Stirling's approximation, 
(— l)'(~j'^) is approximately equal to (r(d))~^/°'~-^ for a positive integer I not 
small, where F is the usual Gamma function. Then, a fairly straightforward 
calculation will show that i?||/i2|P/'n' is approximately equal to constant 
times when n is not small. 

The statistical model, which we assume to be correct throughout, is 

= + with V^/Uj = 7t, where 

{fft} is mean zero stationary, 

(2.2) 

{7^} are i.i.d. with mean zero and variance jr?'^ ^, 
{et\ and {74} are independent. 

We have discussed, above, that the trend consists of a polynomial part 
li\t and a purely random part [i2t- It is important to point out that the 
purely random part ii2t is a nonnegligible one. This can be seen once we 
note that /Li2t has zero mean and -E(/i2t) approximately equal to a constant 
times {t/nf''^~^ . Thus, the purely random part of the trend is nonnegligible 
except when t is small. 

We will now find a matrix representation of the estimate of /i obtained 
by minimizing (1.3). For d > 0, the summation and difference operators on 
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i?" denoted by Sd and S-d, respectively, are defined as follows: for any x in 




It can be shown that SdS^d = I and Sq = I. Properties of summation and 
difference operators can be found in Burman (2006). Note that the differ- 
ence operator S-d is a lower triangular band matrix with element is 
(— I)*"-' (f-j) , t> j. Let the {n — d) xn matrix obtained by deleting the first 

d rows of S-d be denoted by S-d as follows. Then, with Y = (Yi, . . . jYn)' , 
we can rewrite (1.3) as 

(2.4) SSE{fi, V, d) = \\Y - /if + ui^iS'-dS-df^. 

Hence, minimizing the expression in (2.4) with respect to /Zf's leads to the 
estimate 

(2.5) fi = fi{u) = {I + uS'_aS-d)-'Y. 

The estimate fi given above in (2.5) is rather easy to calculate, even for 
large n, since / + vS-dS-d is a band matrix. It should also be emphasized 
that this matrix representation eliminates the need for deciding the intitial 
values for the Kalman iterations. 

It is possible to construct a pointwise prediction interval for fit under 
Gaussian assumption when the errors {ej} are assumed to be i.i.d., and 
the variances a1 and o"^ are known. In such a case, we can express the 
model SisY = Xj3 -\- + e, where element (i, j) of the matrix X is {t/ny~^, 
j = 1, . . . ,d, Z is a lower triangular matrix of order n whose element (t,j) 
is (— !)*"■' (j~j) . The conditional distribution of /_f given Y is normal with 
mean 

E{fi\Y) =X(3+{I + i^*{Z'Z)'Y\y - XI3) 
and variance-covariance matrix 

D = a^^{I + u*{ZZ')-Y\ 
where can be shown that the estimate given in (2.5) is fj,{i^) = 

X $ + {I + ly* {Z' Z)-^)-^ {Y - X P) , where P is the estimate of (3 obtained by 
the weighted least squares criterion {Y - XI3)'{all + a'^ZZ')-^{Y - XP). 

Since /3 is a -y/re-consistent estimate of the conditional mean of fi given Y 
is well estimated by /i(i^). An approximate (1 — a) 100% prediction interval 
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of fit is given by fit ± ZaDu, where Za is the critical value from the standard 
normal distribution and {Du} are the diagonal elements of the variance- 
covariance D matrix of the conditional distribution of fj, given Y. In practice, 
however, and u* = o'l/a'^ are unknown and have to be estimated from 
the data. In Section 3, we will discuss these issues. The matrix D is not as 
formidable as it looks. It is a banded matrix, since is a banded lower 
triangular matrix whose element {t,j) is given by (—1)*"-' (j^^) , <t — j < d. 

It is not unusual to have time series data with missing observations. If 
a large block of consecutive observations are missing, then nothing can be 
done to estimate the trend during those periods of time. Suppose, for the 
case of simplicity, that all the observations between time periods rii and 
722 are missing, where ni = npi and n2 = np2, and < pi < p2 < 1. Then, 
the methods of this paper can be used to estimate the trend fit for 1 < 
t < 111 and n2 < t < n, and all the results will be valid with appropriate 
modifications. However, the more difficult case is when the obervations are 
sporadically missing. In such a case, the entire trend {/it :t = I, . . . ,n} should 
be estimable. A reasonable approach for estimating the trend would be to 
minimize 

^(Ft - fitf + lyft'S'^dS-d^i 

with respect to fi where J is set of time indices at which the observations are 
available. In such the estimate of fi would be /i = (/ + uS^^S^d) 

where / is a diagonal matrix whose tth diagonal element is 1 or depend- 
ing on whether the observation is available or missing. We have not yet 
investigated this estimate and its properties, and this issue needs further 
research. 

2.1. Asymptotic mean square error of the estimate of the random trend. 
Let us denote the conditional mean of the estimate, given the trend as 
/Ij = E{flt\fii, . . . , /in)- If we view JJ^ — /if as the bias of the estimate fit, then 
we can decompose the mean the square error as in the usual variance-bias 
decomposition 

E\\fL — = E\\fi — 'PW'^ + EWJl — fi\\'^ . 

Theorems given below tell us the asymptotic values of the bias and the 
variance. We will provide an outline of the results here. If we write b = 
jyi/(2'i) g^j^f^ assume that 6 — > and n6 — > oo as n — > oo, then it turns out 
that 

E\\fL - -flf/n = ci(n6)-^[l + 0{{nh)-^) + 0{h)], 
EWfl - /ill Vn = C2h^'^-\l + 0((n6)-i) + 0{h)], 
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where ci and C2 are constants, the expressions for which are to be found in 
Theorems 1 and 2 given below. Note that the results look very much like 
the asymptotic expressions of the variance and bias-square components for 
estimating a regression function using a kernel method with bandwidth b. 
Also, note that the bias looks like that of a function (nonrandom), which is 
d — 1 times differentiable with the (d — l)st derivative satisfying a Lipschitz 
condition of order 0.5. Hence, we can obtain the value of b = b* at which the 
mean square error is minimized and calculate the minimum mean square 
error explicitly. More discussion of the results are given below after the 
theorems have been stated. 

The first result given below is on the asymptotic expression of the vari- 
ance. The proofs of both the theorems need to employ the theories of 
Toeplitz, Hankel and circulant matrices. 

Theorem 1. Assume that the conditions given in (2.2) hold and that 
Y^KjKoo j\Pe{3)\ < oo, where Pe{j) is the covariance of the stationary process 
of lag j for the error process {e*}. Assume that ge{0) > 0, where ge{u) = 
J2~oo<j<oo P^U)^^''^ ^•s ihe spectral density function of the error process. 
Then, assuming v ^ oo and v/n^'^ ^0 as oo, we have 

E\\f, - -Jlf/n = ciz.-i/{2<i) [1 + 0(z."i/(2'^)) + O^u^'^^"^ /n)l 

where ci = ^^(O) Beta(l/(2d), 2 - l/(2(i))/(2d7r). 

The following result gives an asymptotic expression for the bias. 

Theorem 2. Assume that the conditions given in (2.2) hold. Assuming 
that V ^ oo and v/v?'^ — > as n ^ oo, we have 

E\\iJ - /ill Vn = C2(i/i/(2'^V")''"'[l + 0(z.-i/(2d)) ^ o(^i/(2d)/„)]^ 

where C2 =r2Beta(l + l/(2d), 1 - 1 / {2d)) / {2dTr) . 

Remark 1. We have so far assumed that the dth difference {74 = V^/Zf} 
of the trend consists of i.i.d. random variables with mean zero and variance 

= T^/n^'^"^. Are Theorems 1 and 2 valid when {V^fit} are not i.i.d.? 
The answer is yes, if we assume that {7^ = V^fit} is a mean zero stationary 
process with autocovariances Cov{'yt+j,^t) = '^'^/PjU), — c« < j < 00. In such 
a case. Theorem 1 is exactly the same as before. Theorem 2 is also the 
same as before except for the constant that appears in the expression of 
E\\jl — /n. Let g-^ be the spectral density of the process {7t/o"-y} and 
assume that J2i<j<ooj\P-yU)\ < ^ modification of the proof of Theorem 
2 will show 

EWTl - iifin = C2<?-,(0)(i/V(2d) Inf-^ [1 + 0(z.-V(2d)) ^ 0(^1/(2^)/^)], 



ESTIMATION OF STOCHASTIC TREND 



9 



where C2 is the same constant as in Theorem 2. Note that the constant 
involved in the expression of E'H/I — /u|p/n now includes the value of the 
spectral density g-^ at zero. Clearly, for the case when {V^/^t} is stationary, 
all the discussion below about the optimal mean square error in estimating 
the trend remains valid with appropriate constants. 

Remark 2. Note that the mean square error 

D{u) = EU - Mil Vn = [ciz.-i/^^'i) + c2^.i-^/(2d)^-2d+i](^ ^ 

is minimized at 

z.* = n^<'-\2d - 1)-I(ci/C2)(l + o(l)) 
and the smallest mean square error is 

where C3 = ci(c2/ci) V(2rf)2d(2d - . 

Remark 3. We consider, here, the Euclidean distance between the true 
random trend ^ and its estimate fi. Such a distance has been considered 
by many for time-dependent observations [see, e.g., Altman (1990), Burman 
(1991), Johnstone and Silverman (1997) and Truong (1991)]. However, it is 
of interest to consider the distance {jl — ^)'R~^{jl — ji) , where Rg is the nxn 
variance-covariance matrix of the error process {e*}. If the spectral density 
function of this process stays bounded between two positive constants, which 
is the case for the usual ARMA model, then the theory of Toeplitz matrices 
tells us that all the eigenvalues of the matrix i?^ stay between two positive 
constants [Grenander and Szego (1958)]. In such a case, we can find two 
positive constants ki and k2 such that 

kiWfi — /i||^ < {fi — ^1)' R~^[jl — /i) < fc2||/i — /i||^. 

Consequently, all of the rates of convergence results for the Euclidean dis- 
tance ll/i — //|p are also valid when the distance is taken to be {jl — R~^{[i — 

We will conclude this subsection by comparing the optimal mean square 
error as discussed above with the optimal rate of convergence associated 
with nonparametric function estimation problems. A function / on [0, 1] is 
defined to be in the smoothness class p = r + f3, where r is a nonnegative 
integer and 0</3<l, if/isr times differentiable and the rth derivative of 
/ is Lipschitz of order (3. Now, if the trend is a nonrandom function and is 
modeled as fit = n{t/n), and the function fj, is of smoothness class p, then the 
optimal rate of convergence for estimating the trend is given by n"^^/^^^^^^ 
[see Stone (1982), Eubank (1988) and Fan and Yao (2003)]. Theorems 1 and 2 
and the subsequent discussion tell us that, for the state-space model as given 
in (2.2), the optimal rate of convergence is n~^'^^~^^^^'^'^\ This corresponds 
to the rate p= d — 0.5 for the nonrandom case. 
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What is the rate when the order of difference in unknown? We now ex- 
amine the performance of the estimated stochastic trend when the true order 
of difference, which we assume to be do, is unknown, but a dth. order differ- 
encing scheme is being employed to estimate the trend. It may be worthwhile 
to point out that the order of difference d controls the smoothness of the 
estimated trend. Since, in the time series literature, it is assumed that the 
true order of difference is known, we may turn to the literature on nonpara- 
metric function estimation for some guidance. For nonparametric function 
estimation, it has been pointed out by many authors that, from a practical 
point of view, the penalty parameter (or the smoothing parameter in gen- 
eral) is far more important than the order of differencing d employed in the 
estimation procedure [see Beran (2005), Eubank (1988) and Wahba (1990)]. 
For instance, in his analysis of multi-way tables, Beran finds that there is 
not much of a difference in the estimated risk when the order of difference 
is taken to be any integer between 1 and 4. 

However, for the sake of theoretical completeness, we will present the 
results when the true difference order do is unknown, and we are employing 
a dth. order differencing in order to obtain the estimate of the stochastic 
trend. Before we state the result, let us point out that, when d > do, the 
rate remains the same though the constant associated with the rate depends 
on d. However, the constant is the smallest when d = do. When d < do, a 
different rate comes into play, and this rate is the same as in the usual 
nonparametric function estimation [see, e.g., Eubank (1988), Stone (1982) 
and Wahba (1990)]. 

It should be pointed out that Theorem 1 remains valid even when d^do- 
However, Theorem 2 is no longer valid when do. We will write the results 
for the bias part (i.e., analogue of Theorem 2). When d < do, we can only 
obtain a bound for the bias part. But, for the case d > do, we can obtain an 
asymptotic expresssion. 

Theorem 3. Assume that the model as given in (2.2) holds with d re- 
placed by do. The estimate of fi is obtained by minimizing the expression 
given in (1.3): 

(a) When d< do, 

E{\\]2-f,f}/n = 0i{i^/nf'). 

(b) When d> do, 

E{\\J1 - fif}/n = C4(z.i/(2d)/^)2do-i[i + ^(1)]^ 

where C4 = Beta((2(io - l)/(2d),2 - {2do - l)/(2d))/(2d7r). 
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Even though we obtain an upper bound for the case d <do, we beheve it 
is not possible to improve the rate, and that is certainly true when d=l, 
the case for which the exact expressions for the eigenvalues and eigenvectors 
for the matrix S_^S-d are known. Note that the optimal mean square error 
E{\\fi - l-i-\\^}/n is 0(n-2'^/(2rf+i))(-]^ _^ ^j-],)), which is known to be the rate 
for nonparametric function estimation for the nonstochastic case. 

When d>do, Theorem 1 and part (b) of Theorem 3 tell us that the mean 
square error is of the form c{d)n~^'^'^°~^^^^'^'^'^^ when 

c{d) = [5,(0)2°'°-V2(2do - l)(Beta(l/(2d) - 2 - l/{2d)f'^°~^ 

X Beta((2do - l)/(2d),2 - {2do - l)/(2d))]^/^^"'°\ 

It can be shown that c{d) is minimized when d = dQ. In other words, the 
optimal rate of covergence is still n~^'^'^°~^^^^'^'^'^^ as long as d>do, but the 
constant associated with the rate depends on d and the minimum value of 
the constant is acheived at d = do. 

2.2. Weighted least squares estimate of the trend. In this subsection, we 
will discuss a weighted least squares estimator of the trend and its asymp- 
totic properties. Since the arguments needed to prove the results given in 
this subsection are similar to those for Theorems 1 and 2, we will merely 
state the results. We will first discuss the case when the variance-covariance 
matrix of {et-t = l,...,n} is assumed to be known. A weighted least 
squares estimate ^(^^'^^ of /U may be obtained by minimizing 

(2.6) (y_^)'i?-i(y_^) + ^ ^ (vVt)^ 

d+l<t<n 

instead of minimizing the quantity given in (1.3). Clearly, then 

(2.7) /l(-i^) = {R;' + uS'_,S.,r'R-'Y. 

In practice, of course, the matrix R~^ is unknown and has to be estimated 
from data. If i?e is an estimate of R^ , then a practical weighted least squares 
estimate of the trend is given by replacing R^ in (2.7) by R^, and we denote 
the resulting estimator by fi^"^^^^ . We will obtain, below, analogues of The- 
orems 1 and 2 for and also show that the difference /i^'"'^) — ^(™^'^) is 
small in the probabilistic sense under appropriate conditions. 

We will assume that the error process {st} is AR{p) [or ARMA(p, g)]. Us- 
ing a preliminary estimate fL~^ of ^, we can estimate p (via a model selection 
criterion such as AIC or BIC) and the parameters of the error process by 
using the residuals Yj — jlf . Clearly, an estimate of the variance-covariance 
matrix R^ can be obtained from the estimated model of the error process. 
Moreover, the estimated variance-covariance matrix R^ is a ^/n consistent 
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estimate of R^; that is, \\Rs — Re\\ = Op{n~^/'^), where || • || is the usual 
matrix norm (i.e., the maximum of singular values). 

For the following results on the asymptotics of the weighted least squares, 
for any x in i?", we define ||x||^_i to be equal to x'R~^x. 

Theorem 4. Assume that the conditions given in (2.2) holds and that 
the error process {et} is AR(p). Let /I^"'*^) = E'f/I^^'^^l/u] . Then, as u ^ oo 
and V jr?'^ — > 0, the following results are true: 

(a) EiWTt^-'^^ - 7l(-i^) ||^-i]/n = C4Z/-i/(2'^) [1 + 0{u~'/^'''^) + 0{u'/^'''^ /n)], 
where C4 = c/e(0)-i/(2d) Beta{l/2d,2 - l/{2d))/{2dTr); 

(b) i?[||7l(-^^)-/z||^-i]/n 

= C5i/i-^/(2d)^-2d+i[-L ^ 0(i/-i/(2^)) + 0(z.i/(2^) /n)], 

where C5 = T^ge{0)-^/^^'^^ Beta(l + l/{2d), 1 - l/(2(i))/(2d7r); 

(c) The mean square error is given by 

= [C4z.-l/(2<i) + ^^^l-l/(2d)^-2d+l] ^ 0(^-1/^) + 0(z.l/(2^) /n)]. 

Theorem 5. Assume that the conditions for Theorem 3 hold. Moreover, 
assume that the fourth moment of the error process £t is finite and ||-Re — 
R,\\=Op{n-^/^). Then, 

||/l(wls) _ ^|||_,/n - - /i||^-i/n = Opil/n). 

Remark 4. There are a number of consequences that follow from The- 
orems 3 and 4. First, from the asymptotic expression of the mean square 
error for the weighted least squares estimator /I^™'*^) of fi for a known R^, we 
can obtain the optimal rate of convergence. Note that the minimum mean 
square errors, for the weighted least squares estimate /J^™^*^) and the ordinary 
least squares estimate fi given in Section 2.1, differ only in constants. More- 
over, Theorem 4 guarantees that the weighted least squares estimate p,^^^^^ 
for unknown i?^ has the same mean square error as /I^™^*^) in the asymptotic 
sense. 

Remark 5. Even though Theorem 3 is stated for the case when the 
error process {et} follows AR(p), this result is true for any stationary error 
process as long as its spectral density function is bounded away from and 
00. Theorem 4 holds as long as the error process has a finite-dimensional 
model such as AR{p) or an invertible ARMA(p, g). 
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3. Data dependent selection of v and d. In this section, we will discuss 
the issue of selecting the smoothing parameter u and the degree of differences 
d for the ordinary least squares estimate /t given in Section 2.1. Criterion for 
selection of v and d can be developed for the weighted least squares estimate 
^(wis) |-|y following similar arguments given in this section. However, we will 
not address that issue here. The main idea rests on minimizing the expected 
distance between /x and its estimate fi = given by 

D{u,d) = EWfi — ^\\^ /n = E\\jl —Ji\^ /n + E\\ll — /n^ 

where 'Jl = Ji{y) = E[jl{v)\iJi\. Ideally we would like to minimize D with re- 
spect to v (and d). However, it is unknown, and so we try the route of 
estimating D by following the arguments given by Akaike and Mallows. 
Now, the expected value of the residual sum of squares is 

E{SSE{v,d)) = E\\Y - [if 

= + E\\fl - Jlf + EWll - /if - 2 tr((I + uS'^aS-d)~^Re), 

where o"^ = E(ef) and variance-covariance matrix of the error series {et} is 

Re- 

So, an unbiased estimate of D{u,d) is given by 

(3.1) D{v, d) = SSE{u, d)/n + 2 tr ((/ + vS'^^ ^^T^ Re) jn-a^. 

Since the last term in the expression of D does not depend on v (and d), 
we can safely ignore it. Using the same arguments as in Theorem 1, we can 
show that 

tr((/ + z.5'_rf5_rf)-ii?,) = 5^ <7,(7rj7n)/(l + vsii^jln)) + 0(1), 
where is the spectral density of the process {ej}, siu) = (2 — 2costi)'^, and 

/■oo 

C6(d) = 7r-^ / 1/(1 + n2^)du = Beta(l/(2(i),l - l/(2(i))/(2d7r). 

If we can get a reasonable estimate of the g^^ as in the method given 
below, then, by ignoring the term involving in (3.1), we can arrive at the 
following criterion function 

(3.2a) 4>{y,d) = SSE(v)ln-,(\ln)Y,U'^2ln)l(\^vs{Txjln)) or 
(3.2b) (\>{y, d) = SSE{v) /n + 2i/-^/(2rf) (o)c6 (d) . 

The first criterion given in (3.2a) is preferable, since the second one in (3.2b) 
is an approximation to the first when both n and v are large. 

We will now concentrate on how to find a reasonable estimate of the 
spectral density of the error process {et} at zero. Let us assume that the 
error sequence is AR(p), an autoregressive process of order p, where p is 
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unknown and needs to be estimated. Consider the local linear estimator jlf 
of /it on the basis of observations lf_fc, . . . ^Yt+k^ where k is approximately 
equal to y/n/2. lit <k + l, then the estimator is based on Yi, . . . and a 

similar modification is done when t> n — k. Using a selection criterion such 
as AIC or BIC we can select an autoregressive model using the estimated 
error sequence et = Yt — jlf . Let ^^(O) be the estimate of the spectral density 
of the error sequence at zero on the basis of the estimated autoregressive 
process. 

A second approach is to minimize the innovations log likelihood for the 
state space model defined by the observation equation (1.1) and the state 
equations (1.2) and 

(3.3) £t = 4>iet^i + 4>2et-2 + 5t, 

where {5t} are mean zero i.i.d. with variance (5^. Again, AIC or BIC can be 
used to estimate the order p of the autoregressive process. The estimators 
for /U, say /i, are the usual Kalman smoothers, produced as a by-product 
when using the EM algorithm to estimate the unknown parameters (pi, <j)2 
and (t|. The Kalman smoothers also produce the estimated mean square 
error oi E\\jl — fi\\'^ , which can be used to set pointwise uncertainty limits 
for the smoothed trend. 

We have used both methods above to obtain an estimate of the trend of 
the global temperature data, as given in Jones et al. (2000). Figure 1 shows 
the yearly average of land and marine temperature stations beginning in 
1856 and ending in 2000. We have chosen a relatively long time span that 
indicates that the assumption of linearity, often made on the basis of temper- 
ature series beginning in 1900, may not be realistic over the long term. The 
first model selection criterion described above selected an AR(2) model for 
the error process {et} with parameter estimates <j)i = 0.3784, (p = —0.1660, 
(5"| = 0.0096. The selected order of difference and the penalty parameters 
turned out he d = 2 and z) = 219.8. Applying maximum likelihood (the sec- 
ond approach) yielded comparable values = 0.3882, (j)2 = —0.1641 and 
(t| = 0.0095. As a matter of fact, the trend estimates for these methods 
turned out to be indistinguishable. The fitted trend and data are plotted in 
Figure 1, and we note that the estimated trend conforms more to a nonlinear 
function with two periods of relative stable global temperatures and the two 
periods of rather steep increases, the last beginning at about 1975. 

3.1. Simulations. We have done simulations in order to check the suit- 
ability of our criterion for sample sizes n = 100 and n = 300 with different 
signal-to- noise ratios (SNR). We have tried two cases when the true value 
of d is either 1 or 2. So, the model we have tried is 



Yt = fit + et, t = l,...,n, 




Fig. 1. Yearly temperature anamalies (1856-2000) in degrees centigrade relative to the 
1961-1990 mean. Solid lines are the fitted trend and the upper and lower point-wise 95% 
posterior probability points based on assuming normality of p. — and parameters fixed at 
their likelihood estimates. 



where {ej are i.i.d. A^(0,1), /^t = Ei<i<t(-1)* ^{t^^j)lj^ where {7^} are 
i.i.d. N{0,a^). We choose cr^ in such a way that the signal-to-noise ratio is 
either 2, 5 or 9. We have used the criterion given in (3.2a) and (3.2b) in order 
to estimate z/ and d. Since the errors in the simulation are taken to be i.i.d., 
there are many ways to estimate its variance. We have used a fairly simple 
estimate here. The estimate used here is half times the average of the squares 
of the first differences of the observations. We have calculated the ratio 
R = infd^u WP'id, v ) — z>) — where the minimum in the numerator 

is over > and d in {1,2}, and (1^,^) are obtained by minimizing the 
criterion function discussed above. We have calculated the mean, standard 
deviation and median of R for various combinations of d and SNR. All of 
the calculations are based on 400 repeats. How well we can estimate the 
underlying trend depends on the signal to noise ratio. The higher the SNR 
value, the better the estimate. The simulation results given in Table 1 show 
that the estimation methods proposed here work reasoanably well. 
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Table 1 

Simulated values of the performance ratio R 









n = 100 




n = 300 




Mean(SD) 


Median 


Mean(SD) 


Median 


d= 1 


SNR = 


2 


0.9381 (0.0603) 


0.9532 


0.9599 (0.0387) 


0.9698 




SNR = 


5 


0.9411 (0.0488) 


0.9511 


0.9726 (0.0306) 


0.9830 




SNR = 


9 


0.9367 (0.0460) 


0.9435 


0.9728 (0.0285) 


0.9830 


d= 2 


SNR = 


2 


0.7833 (0.1839) 


0.8231 


0.8343 (0.1582) 


0.8768 




SNR = 


5 


0.8175 (0.1531) 


0.8502 


0.8424 (0.1548) 


0.8923 




SNR = 


9 


0.8377 (0.1451) 


0.8735 


0.8696 (0.1335) 


0.9142 



APPENDIX 

We will begin this section with a number of notations and definitions 
which will be used in the proofs. 

Notation 1. For any square matrix A of order n, we will denote its 
singular values by ai{A), . . . ,an{A) and its eigenvalues by Xi{A), . . . , Xn{A) 
(singular values of A are the positive square roots of the eigenvalues of A' A). 

Notation 2. The function 1 — e*", — vr < u < vr, will be dented by so{u) 
and |so('u)P = 2 — 2cos(ti) will be denoted by s{u). 

It is important to note that the dth order finite difference matrix as given 
in (2.3) is a Toeplitz matrix with symbol Sq. The Toeplitz matrices asociated 
with Sq and s'^ will be used often in the proofs. There are a number of 
different matrix norms that will come into play, and we will define them 
here. 

Definition 1. Let crj{A), j = 1, . . . ,n, be the singular values of the 
matrix A. The following three norms are used widely: 

(i) Spectral radius norm. ||A|| = max(Tj(j4); 

(ii) Trace norm. ||^||i = '^i(^); 

(iii) Frobenius norm. ||^||2 = '^|(^)}^^^- 

Definition 2. A matrix T = {{bjk)) is of Toeplitz type if hjk = bj^k- If 
bj-k is given by J^^ exp(i(j — k)u)f{u) du/{2TT), then the function / is called 
the symbol of the Toeplitz matrix and often the notation T{f) is used to 
denote the Toeplitz matrix. The submatrix consisting of the first n rows and 
columns will be denoted by T„(/). 
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Definition 3. A matrix H = {{hjk)) is of Hankel type if hjk is of the 
form hj+k- If bjj^k is given by /^^ exp(i(j + k)u)f{u) duj (27r), then / is cahed 
the symbol of the Hankel matrix and often the notation is used to 

denote the Hankel matrix. The submatrix consisting of the first n rows and 
columns will be denoted by Hn{f)- 

We will now define another special type of matrices called the circulants. 
The circulants can be used to approximate Toeplitz matrices. Let Pq be the 
nxn cyclical permutation matrix whose element {j, fc) is 1 if j — A; = l[modn] 
and otherwise. Then, Pq has the form given below: 



/o 





• 


• • 1\ 


1 





• 


• • 





1 


• 


• • 








• 






• 


■ 1 0/ 



Definition 4. A square matrix C„ = {{bjk)) of order n is called a cir- 
culant if bjk = fej-fc[modn]- If bj's are given by f{u) = - E-r<i<r ^i^*^"' then 
the circulant C„ is said to be generated by the symbol /, and we write C„(/) 
to denote it. If Pq is the cyclical permutation matrix as given above in (A.l), 
then we can express C„(/) =J2~-r<j<rbj^o- 

We will first write down a few important lemmas, the proofs of which 
will be given after those of Theorems 1 and 2. We begin with an interlacing 
theorem due to Weyl [see Theorem 4.3.6 in Horn and Johnson (1985)]. 

Theorem 6. Let A and B he two real symmetric matrices so that A — B 
has rank at most r. Let Xi{A) < ■ ■ ■ < XniA) and Xi{B) < ■ ■ ■ < XniB) be the 
eigenvalues of the matrices A and B. Then: 

(a) Xj{A) < Xj+r{B), j = l,...,n- r, 

(b) Xj{B)<Xj+r{A), j = l,...,n-r. 

The next result on singular values, which is analogous to the previous 
one, follows from the result given on page 423 in Horn and Johnson (1985). 

Theorem 7. Let A and B be two matrices of order nxn, and the 
matrix A — B has rank at most r. Let (Ti{A) > a2{A) > • • • be the singular 
values of A, and, similarly, let ai{B) > a2{B) > ■ ■ he the singular values 
of B . Then: 
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(a) aj+r{A) <aj{B), j = l,...,n-r, 

(b) aj+r{B) <aj{A), j = 1, . . . , n - r. 

The next lemma finds the bounds on the singular values and the trace 
norm of a Hankel matrix. 

Lemma 1. Let H = {(bj^k)) be a nx n real Hankel matrix. Let (Ti{H) > 
■ ■ ■ ^ <^n{H) be the singular values of H . Then: 

(a) aj{H) < Ej+i<t<2n\bt\, 

(b) l|i^l|l=El<,<nk.-(^)l<2El<K2n^l&d- 

The following important lemma tells us how a Toeplitz matrix can be 
approximated by a circulant matrix. 

Lemma 2. Let r„(/) be a Toeplitz matrix with the symbol f{u) = 
J2~N<j<N^j^~^''^ > where N < n/2. Let Cn{f) he the circulant matrix given 
hy J2~N<j<N^j'^o ' where the generator permutation matrix is as given in 
(A.l). For 1 < i < n, consider the vector Cj whose tth element is 
exp(— i27rjt/n), l<t<n. Then, ei,...,en are orthonormal. The fol- 
lowing results hold: 

(a) When bj = b^j, the eigenvalues (unordered) and the corresponding eigen- 
vectors of the circulant Cn{f) are given by /(27rj'/n) with the corre- 
sponding eigenvectors ej; 

(b) rank(C7„(/)-T„(/))<2iV; 

(C) WCnif) - TM)\\l < 2El<i<Ar(j + 

(d) The circulant matrix Cn{f) can written as J2i<j<n fi'^'^J ■ 

The next result compares the sum of squares of the singular values of a 
matrix A to the sum of the singular values of a another matrix B when the 
matrix A — B is of finite rank. The proof is omitted as it an easy consequence 
of the interlacing theorem. 

Lemma 3. Let A and B be two square matrices of order n, and the rank 
of matrix A — B does not depend on n . Then, 

Y: <7]iA)- Y: <7]{B) = 0{l){af{A)+aj{B)). 

We will present a few known results without proofs. These results will be 
useful in our proofs. The first result [Theorem 1.1 in Bottcher and Grudsky 
(2000)] obtains an upper bound of the norm of a Teeplitz matrix in terms 
of the supremum norm of its symbol. 
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Theorem 8. Let Tn{f) be a Toeplitz matrix with symbol f . Let \f\oo be 
the supremum norm of f . Then, 

\\Tn{f)\\<\fW 

The following result tells us that the matrix S'_iiS-d does not differ much 
from the Toeplitz matrix Tn{s'^). 

Lemma 4. All the elements of the matrix S'^^S^a ~ Tn.{s'^) are zero 
except for the first and the last principal submatrices of order d. 

Proof of Theorem 1. In the proofs, we will assume that, for any real 
symmetric matrix C of order n, its eigenvalues denoted by Ai(C), . . . , A„(C) 
are ordered from the smallest to the largest. Also note that the unordered 
eigenvalues of a circulant C„(/) are given by f{2TTj/n), j = 1, . . . ,n. Hence, 
it possible that \j{Cn{f)) / f{2TTj/n) for some (or even all) values of j. For 

notational simplicity, we will denote the matrix S'_^S-d by U. 

First, note that the estimate fi is given by (/ + i/5''_(^S'_rf)~^y = (I + 

uU)-^Y [see (2.5)] and Jt = {I + uSi^S-d)'^ l^ = {I + i^Uy^fi. Since ge is 
the spectral density function of the process {et}, the variance-covariance 
matrix of {et :t = 1, . . . ,n} is given by the Toeplitz matrix Tn{ge)- Hence, 
we have 

(A.2) E\\fL - Jlf/n = tr((/ + vUy^Tn{g)) /n. 

The main idea behind the proof is to use approximate S'_dS-d = U and 
Tn{ge) by circulants and then use the well-known theory of circulants to get 
the result. 

Recall that the spectral density function of the error process {et} is given 
by 5e(n) = J2-oo<t<oo P£{^)^~^^^i where the covariances satisfy the condition 

'Zl<t<ooApe{t)\<00. 

Let N = [n/4], the integer part of n/4, and define ^^^(u) = ^\t\<N Pe{^)^~^' 
Then, 

\\geN-9e\\oo< E \pe{t)\<N-^ ^ t\pS)\ = o{l/n) . 

\t\>N \t\>N 

Now, define gsNiu) = max(^j^(n),0). Since gs is a nonnegative function, 
WSeN — S'elloo = o(l/n). From Theorem 8, we have 

\\Tn{ge -9eN)\\ < |be - 9sn\\oo = o{l/n). 

Hence, 

|tr((/ + uUr^TM) - tr((/ + vU)-^Tn{geN))\ 
(A.3) =M{L + vUr^Tn{ge-geN))\ 

< tr((/ + i/?7)-2)||r„(<7, - g,N)\\ = o{l/n) tr((/ + uU)-^) = o(l). 
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Now, using part (c) of Lemma 2, we get 

|tr((/ + uUy^CnigeN)) - tr((/ + iyUr^Tn{geN))\ 

(A.4) < 11(1 + uUrm\Cn{geN) - Tn{g,N)\\l 

<0(1) (t + l)|p.(t)l<oo. 

l<t<Ar 

So, combining (A. 3) and (A.4), we get 

(A.5) tr((/ + lyUy^Tnige)) = tr((/ + i^Uy^Cn{geN)) + 0(1). 

Note that g^ is differentiable because of the assumption Z]i<t<oo < 
oo. Since ge is assumed to be positive on [tTjTt], we can assume that g^jy 
is also positive as N^oo. Now, the eigenvalues (unordered) of the circu- 
lant Cn{geN) are given by gg^i'^'^j /n), and they are all nonnegative. Conse- 
quently, we can define a positive square root of the matrix Cn{geN), and it is 
given by Cn{geNY^'^ = '}2geN{'^T^3 /nY^'^e.je*, where e^-'s are the eigenvectors 
given in Lemma 2. 
Consider the matrices 

B = {I + vUy^Cn{geNf'^ and 

(A.6) 

A=[I + uCn{s'')r^Cn{geNf'\ 

where s{u) = 2 — 2cosu is as defined in Notation 2 at the beginning of this 
section. 

So, from (A.5) and (A.6), we have 
(A.7) tr{{I + vUr^Tn{ge))=HB'B) + 0{l)= (r,{Bf + 0{l). 

Now, note that 

A = B + iy{I + uCn{s)r\U - Cn{s)){I + yCn{geN)r^Cn{geNf'''. 

Lemma 4 tells us Tn{s'^) — U has rank at most 2d. Since Tn{s'^) is a banded 
matrix, by part (b) of Lemma 2 we see that the rank of Cn{s'^) - T„(s'^) 
is at most 2d. So, the rank of Cn{s'^) — C/ is at most Ad. Consequently, the 
rank of the matrix j4 — is no larger than Ad. Now, note that eigenval- 
ues of the matrix A' A = CnigeNY'^il + vCn{s'^))-^Cn{geNY'^ and B'B = 
Cn{geN)^^'^{I + vU)~'^Cn{geNY^'^ are bounded above by a positive constant, 
which is independent of n and v. Hence, by Lemma 3, 

iT{B'B)= J2 <^Mf + 0{l). 
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Note that the unordered eigenvalues of the matrix A' A are (l + i^s(27rj'/n)'^)~^ x 
gsNC^T^j /n), j = 1, . . . ,n. Hence, we can conclude that 

tv{B'B)= Y: <T,{Af + 0{l) 

l<j<n 

(A.8) 

= 9eN{27rj/n){l + us{2'Kj/nf)-^ + 0{l). 

l<j<n 

Note that Qs, and, hence, geNHdeN > 0) are differentiable with bounded 
first derivatives because of the assumption X]i<t<oo < ^^'^ that 

he - 9eN\\oo = o{n-^). From (A.8), we get 

tT{B'B)= Y geN{27Tj/n){l + iys{2TTj/n)'')-'' + 0{l) 
i<i<" 

' gs{27ru/7i){l + us{27:u/7i)'^y^ du + 0{1) 
n(2^)-i r g,{u){l + us{ufr^ + 0(1). 



From the last expression and (A. 2) and (A.8), we get 
E\\fi - Jif/n = rT^ tr{B'B) + 0(l/n) 

(A.9) = (27r)-i r g{u){l + vs{uY)-^ + 0{l/n) 

Jo 

= 7r-i r g{u){l + iys{uf)-'' +0{l/n). 



Now, if we denote 0(m) = {sin(M/2)/(ii/2)}^'^, then we can write 
(A.IO) s{uy = (2 - 2cosn)'^ = 4'^sin(V2)^'^ = u^U{u). 

Making a change of variable z = i/^/^^'^^n, we get 
E\\fi — /xip/n 

(A.ll) =7r-V-i/(2<i) / 5,(zi.-V(2rf))(l + z2rf</,(zi.-i/(2d)))-2^^ 







+ 0(l/n) 



Now, note (/)(0) = 1 and that the function g^ and (p have bounded derivatives 
on [0, vr] . Consequently, the last expression yields 

i^||^_jl||2/^=(2^)-l^-l/(2d)^^(0) / (l + ^2«()-2^^(i + o(z.-V(2rf))) 

Jo 

+ 0{l/n). 
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The proof of this result is now complete, once we note that 

/ (l + z2'^)~2dz = Beta(l/(2d),2- l/(2d))/(2(i). p 

JO 

Proof of Theorem 2. As in the proof of Theorem 1, here, too, we 
will assume that, for any real symmetric matrix C of order n, its eigen- 
values, denoted by Ai(C), . . . , An(C), are ordered from the smallest to the 
largest. Unordered eigenvalues of a circulant Cn{f) are given by f{27rj/n), 
j = 1, . . . ,n, and it is possible that Xj{Cn{f)) / f{2-Kj/n) for some (or even 
all) values of j. As in the proof of the last theorem, we will denote S_i^S^(i 
by U. 

A matrix representation of jl is given in (2.5). As a consequence, we have 
Jl- fi={I + vUy^ii - /" = -^{i + vuy^u^i, 

since 

S^d^^ = (VVd+i> • • • , V^^Un)' = (7d+l, • • • ,7n)', 

where 7j are i.i.d. with mean zero and variance o"^. Hence, 

(A.12) ^11/7 - = z^V^ tr((/ + uUy'^U). 

Let 4'{u) = u/(l + z^u^), when u is a real number. Now, we can write the 
matrix (1 + uU)~'^U as iIj{U). Hence, a re-expression of the relation (A.12) 
is given by 

(A.13) i?||7Z-/i|P = z.V^tr(V'(C/))=z.V2 ^ V(A,(C/)), 

where \i{U), . . . , \n{U) are the eigenvalues of the matrix U . As in the proof 
of Theorem 1, we can now approximate U by the circulant matrix Cn{s'^)- 
Now, note that, for any j = 1, . . . ,n, both il){\j{U)) and il){Xj{Cn{s'^))) are 
bounded above by u~^. Since is an increasing function on [0,oo), we can 
follow an argument similar to the one given in the proof of Theorem 1 to 
show that 

Since the unordered eigenvalues of Cn{s'^) are s{2Trj/n)'^, j = l,...,n, by 
Lemma 2, from the last expression we have 

(A.14) E ^(^i(^))= E H<2nj/nr) + 0{u-'). 
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Noting that the function ij: is bounded on [0, oo), we have 

V il:{s{2-Kj/nf)= / '4>{s{2-Ku/nY)du + 0{u~^) 

l<j<n ^0 

(A.15) =n(2^)-i / ^P{s{u)'^)du + 0{u-^) 



Jo 

From the relations (A. 13), (A. 14) and (A.15), we get 

(A.16) E\\J1 - n\\^/n = z^V^vr-i £ Hs{u)'^) du + 0{u/n^'^). 

If we write s{u)'^ = v?'^(l){u) as in (A. 10), then, by a change of variable z ■ 

i/^/i'^d.)u, we get 

ipis{u)'^) du = r v?'^(l){u){l + uu^'^<i){u))~'^ du 
Jo 

(A.17) =^-1-1/(2^) / z2d^(^^-l/(2d)) 







X (l + z2rf^(^l/-l/(2'^)))~2(iz. 

Since (/)(0) = 1 and (j) has a bounded first derivative, calculations will show 
that 

(A.18) 

^ ' poo 

= / z2'^(l + z2'^)-2fi2(l + 0(j.-V(2rf))). 

Jo 

Combining (A.16), (A.17) and (A.18), we get 

/■oo 

Jo 

This completes the proof of this result, once we note that 

roc 

J z'^'^{l + z^'^)-^dz = Beta{l + l/{2d),l-l/{2d))/{2d). □ 
Proof of Theorem 3. From the proof of Theorem 2, we have 

(A.19) 71 - /i = + uuy'Ufi = -1/(7 + uuy^uinit + Sdoi), 

where U = S'_dS^d- 
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(a) We will assume that the polynomial part is written in the form fiu = 
J2i<j<do~i that the coefficients (3j are constants. The proof for 
the case when Pj are random with finite mean and variance is the same. 
Note that 

E{\\1I - ^^f}/n = n^V2[||(/ + i^uy^Umf 
(A.20) + tr((/ + uU)-^USd,S'a^U{I + uUy^)] 

= n-V2[||(I + uUy^Ufiif + T^-^'^" tv{AA% 
where A = {I + uU)~^U Sdi^. Note that ||5'_d/ii|| =0{n~'^) and, hence, 

11(7 + = o(n-2^)ll(/ + 

(A.21) 

Since S-dSd is a (n — d) x n is a parttioned matrix of the form [0:7] = /, 
where the first matrix in the partition is a (n — d) x d matrix of zeros and the 
second matrix is the identity matrix of order n — d. So the matrix S-dSdo can 
be rewritten as S-dSdSd^^-d = ISd^-d- It is known that the largest singular 
value of SdQ-d is of order n'^o-d jggg Theorem 2 in Burman (2006)]. Hence, 

tiiAA') < tr((/ + iySidS^d)-'s'_dS^diI + uS'^dS^d)-')\\ISdo-df 

= tr((/ + us'^^s^dr's'^dS^dii + i^:s'„rf:s_,)-i)o(n2('^o-<^)). 

The proof of Theorem 2 [(A. 14) through (A. 2. 8)] shows that 

tr((/ + uS'_dS^dr's'_dS^d{I + uSidS^dT') = 0(nz.-i-i/(2^)). 

Hence, we get tr(^74') = 0(n~^"'z^~^~^/(^'^)). Now, combining this result with 
those from (A.20) and (A.21), we get 

^{IIa* - f^f}/n = 0{n~^'^u'^) + 0(ni/-i-i/(2"')) = Oin-^'^u-^). 

(b) Since S^dfJ-i = 0, from (A.20) we get 

E{\\J[ - fif}/n = n" W^n-^'^" tr(yl^')- 

Let 

Cn{s^o°)~= E M27rj/n)-^<^e,e*. 

i<j<"-i 

Then, C„(sq°)~ is generalized inverse of C„(sq°). We will approximate AA' 
by BB', where B = (I + z/C7„(s'^))-iC„(s'^)Cn(s'^o/2)-. Since the rank of 
S'_^^S-dQ — Cn{s'^") is no larger than 2(io, and the rank of Cn(sQ°) is n — 1, 
the rank of Sd^ — Cn{sQ°)~ = S^do^^ ~ CnisQ°)~ is at most 2do + 1. Also, 
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note that the rank of U — Cn{s'^) is at most 2d + 1. So, using Lemma 3, we 
get 

EiWU - /^f = r2i.2n-2<^o iT{AA!) 

We will later show that, in the last expression, the second term involving 
ai{Af' and ai{B)'^ is small in comparison to the first term; that is, 

E{\\-p - /if }/n = r2zy2j^-2rfo tr(SS*)[l + o(l)]. 

Note that the smallest eigenvalue of BB* is zero and the rest of the eigenval- 
ues (unordered) are given by s(27rj'/n)2"'~'^"/(l + i^s(27rj/n)'^)2, j = 1, . . . ,n — 
1. 

So, 

l<i<n-l 

When d>do, an argument similar to the one used in the proof of Theorem 
2 will show that 

= r2(z.i/(2d)/n)2'i"-i 

X Beta((2do - l)/(2(i),2 - (2^ - l)/(2d))/(2(i7r)[l + o(l)]. 

What is left to show is that 

v^n~''''^^{ai{A?+ai{Bf)=o{l){v^''^^''^/nf'''~\ 

We will first prove the case for ai{B)'^ . Recall that the smallest eigenvalue 
of BB* is zero, and the rest of the eigenvalues (unordered) are given by 
ipj = s{2TTj/nf'^-'^o/{l + vs{2T:j/nYf, j = l,...,n - 1. Since the largest 
eigenvalue of BB* is no larger that y-'^+'^o/d^ have that 

u^n-^^^^a^iBf < (z.V(2rf))2'^o ^ o(l)(z.i/(2d)/^)2do-i_ 

Now, let us find the bound for the term ai{A)'^. Let F = S'^iS^i. Calcula- 
tions will show that 

It can be shown that U = S'_dS^d < (-S'^i'S'-i)'^ = F'^, where, for any two 
matrices, the notation "C < means that D — C is nonnegative definite. 
If C and D are nonnegative definite and C < D, then it can be shown that 
(/ + C)~^C < (I + D)~^D. Consequently, the largest eigenvalue of AA' is no 
larger than the largest eigenvalue of (/ + zyF°')"^F°'F"'^o_F'^(/ + uF''') = {I + 
lypd^^-'ip'id-do _ gy Gershgorin's result, one can see that all the eigenvalues of 
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F are bounded above by 4. So, the largest eigenvalue of (/ + zyi^'^)~2^2d-do 
is bounded above by z/-2+do/rf^ Consequently, the largest eigenvalues of AA' 
is no larger that zy-2+(io/d_ Hence, we conclude that 



n 



□ 



Proof of Lemma 1. (a) Since the singular values of H are the positive 
square root of the eigenvalues of H'H, by the Courant-Fischer minimax 
theorem [Theorem 7.3.10 in Horn and Johnson (1985)], the square of the 
jth singular value of H is 



min max x'H Hx. 

ni,...,Mj_i in R" xGR" ,\\x\\=l,x±{ui,...,Uj-i} 



Now, take any vector x in i?" whose first j — 1 components are zero. In the 
case we will be basically concerned with, the principal submatrix of H'H 
consists of the last n — j + 1 columns and rows. Applying Gersgorin's theorem 
on the localization of eigenvalues [see Theorem 6.1.1 in Horn and Johnson 
(1985)], the largest eigenvalue of this principal submatrix is no larger than 



i<s<n '-^ '-^ 
j<t<n l<l<n 



So, we have 



a]{H) < rnax ^ ^ hs+ih+t 

i<s<n — — 
■' j<t<n l<l<n 



< max Y 

l<s<n I 

< max \bs+l\\ 



l+j<t<n+l 

Y h( Y n)< 

(b) Using part (a), we have 



< max 

j;<s<n 



E 1^* 

j+l<t<2n 



E \<^Am< Y E 1^*1 < E 

l<i<" l<i<»ii+l<t<2n \<t<2n 

and this completes the proof. □ 

Proof of Lemma 2. (a) This part follows from the well-known re- 
sults on circulant matrices [see Chapter 4 in Marcus and Mine (1992) or 
Tyrtyshnikov (1996)]. 
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(b) and (c). Let Wn be the orthogonal matrix that has the property 
that, for any x = in iZ", it flips it indexwise; that is, WnX = 

{xn, ■ ■ ■ jXi)' . In turns out that Wn has the form 



(A.22) 





• 




1\ 
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• 1 








• 


• 







• 


• 







1 • 


• 





Vi 


• 




0/ 



Note that the matrix Cn(/) — T„(/), the difference between a ToepHtz matrix 
and its associated circulant, is a 90° clockwise rotation of the matrix Hn{f) + 
WnHn{f)Wn, where Hn{f) is the Hankel matrix with symbol / and Wn is 
the flip matrix as given in (A.22). Part (b) follows from the fact that all the 
elements of Hn{f) + WnHn{f)Wn are zero except for the flrst and the last 
principal submatrix of size N x N. 

Now the singular values of Cn{f) — Tn{f) and Hn{f) + WnHn{f)Wn are 
the same. Since Wn is an orthogonal matrix, the singular values of Hn{f) 
and WnHn{f)Wn are the same. Now, part (c) follows from an application 
of part (b) of Lemma 1. □ 

Proof of Lemma 4. First note that element (j, k) of the matrix Tn{s^) 
is given by (— It is then enough to show that, for any d + 1 < 

j,k <n — d, element (j, k) of the matrix 5'_^5_rf is (— • 
It is not difficult to see that the following identity is valid: 



(A.23) 



E 

o<t</ 



This identity follows from expanding (1 — z)^"^ as J2o<t<2d(.~^Y if) ■ On the 
other hand, we can expand (1 — z)^"^ as 

'd- 



^0<s<d ^ ' ' ^0<t<d 

Note that element (j, k) of the matrix s'_j^S-d is given by 

E (-i)'-(*-,)(-i)'-'(i-0 



l<t<n-d 



(-1) 



E 



l<t<n-d 



d 
t-j 



d 

t- k 



Now, use of the identity (A.23) on the right-hand side of the last expression 
yields the desired result. □ 
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